Stealing Machine Learning Models via Prediction APIs
نویسندگان
چکیده
Machine learning (ML) models may be deemed confidential due to their sensitive training data, commercial value, or use in security applications. Increasingly often, confidential ML models are being deployed with publicly accessible query interfaces. ML-as-a-service (“predictive analytics”) systems are an example: Some allow users to train models on potentially sensitive data and charge others for access on a pay-per-query basis. The tension between model confidentiality and public access motivates our investigation of model extraction attacks. In such attacks, an adversary with black-box access, but no prior knowledge of an ML model’s parameters or training data, aims to duplicate the functionality of (i.e., “steal”) the model. Unlike in classical learning theory settings, ML-as-a-service offerings may accept partial feature vectors as inputs and include confidence values with predictions. Given these practices, we show simple, efficient attacks that extract target ML models with near-perfect fidelity for popular model classes including logistic regression, neural networks, and decision trees. We demonstrate these attacks against the online services of BigML and Amazon Machine Learning. We further show that the natural countermeasure of omitting confidence values from model outputs still admits potentially harmful model extraction attacks. Our results highlight the need for careful ML model deployment and new model extraction countermeasures.
منابع مشابه
Thermal conductivity of Water-based nanofluids: Prediction and comparison of models using machine learning
Statistical methods, and especially machine learning, have been increasingly used in nanofluid modeling. This paper presents some of the interesting and applicable methods for thermal conductivity prediction and compares them with each other according to results and errors that are defined. The thermal conductivity of nanofluids increases with the volume fraction and temperature. Machine learni...
متن کاملThermal conductivity of Water-based nanofluids: Prediction and comparison of models using machine learning
Statistical methods, and especially machine learning, have been increasingly used in nanofluid modeling. This paper presents some of the interesting and applicable methods for thermal conductivity prediction and compares them with each other according to results and errors that are defined. The thermal conductivity of nanofluids increases with the volume fraction and temperature. Machine learni...
متن کاملMachine learning algorithms in air quality modeling
Modern studies in the field of environment science and engineering show that deterministic models struggle to capture the relationship between the concentration of atmospheric pollutants and their emission sources. The recent advances in statistical modeling based on machine learning approaches have emerged as solution to tackle these issues. It is a fact that, input variable type largely affec...
متن کاملStealing Hyperparameters in Machine Learning
Hyperparameters are critical in machine learning, as different hyperparameters often result in models with significantly different performance. Hyperparameters may be deemed confidential because of their commercial value and the confidentiality of the proprietary algorithms that the learner uses to learn them. In this work, we propose attacks on stealing the hyperparameters that are learnt by a...
متن کاملTime series forecasting of Bitcoin price based on ARIMA and machine learning approaches
Bitcoin as the current leader in cryptocurrencies is a new asset class receiving significant attention in the financial and investment community and presents an interesting time series prediction problem. In this paper, some forecasting models based on classical like ARIMA and machine learning approaches including Kriging, Artificial Neural Network (ANN), Bayesian method, Support Vector Machine...
متن کامل